User blog:LittlePeng9/Hard-coding long binary strings
Today I have discovered a way of hard-coding binary strings into Turing machines with asymptotically fewer states. More precisely, I've found the following: :For any string of \(2^k\cdot l\) bits, there is a Turing machine with three colors (0, 1 and blank) and \(2^k+k+4\cdot 2^l+1\) states which, on blank input, writes that string of bits and halts with its head on the first character of the string. By simulating a three-color Turing machine with a two-color one and adding a few more steps to take care of deciphering the string, we can show the following (I don't bother with establishing the constants): :For any \(l\) and large enough \(k\) (depending on \(l\)), for any string of \(2^k\cdot l\) there is a Turing machine with at most \(2^k\) states which, on blank input, halts with its head at the beginning of this string written on the tape. Application The idea occured to me when we were discussing something along the lines of this on IRC. Using the above result, we can finally compare the two functions: \(BB(n)\) eventually dominates \(BB_C(n)\). To see why this is the case, we have to take some Turing machine which is an interpreter of C. Let's take such a Turing machine which, as an input, takes the ASCII code of some program written in C with the read-write head positioned at the leftmost bit of the code, and as an output prints a block of ones of length given by the program's output (we don't care about non-number outputs). Let's say this TM has \(s\) states. Consider some large \(k\). For any C code of length at most \(3\cdot 2^k\), hence with ASCII code's length at most \(24\cdot 2^k\), there is a TM with at most \(2^k\) states which halts after having written that code (the result above with \(l=24\)). Let's create now a Turing machine with \(2^k+s\) states which first writes this string and then applies the interpreter like in the above paragraph. The output of this TM has as many ones as the C program returns. This way we quickly establish an inequality \(BB(2^k+s)\geq BB_C(3\cdot 2^k)\). Finally, for large \(n\), there is an integer \(k\) such that \(2^k+s\leq n\leq 3\cdot 2^k\). Since both \(BB\) and \(BB_C\) are nondecreasing, \(BB(n)\geq BB(2^k+s)\geq BB_C(3\cdot 2^k)\geq BB_C(n)\). Proof of the result Here is a proof-of-concept Turing machine. The string is encoded using the last characters in lines 15-22. Here I present the general idea, below is a construction. Idea The simplest way to hard-code a string of \(n\) bits is to use \(n\) states, say state_1,...,state_n, where state_i would transition to state_(i+1) writing the i-th bit. This, however, is quite wasteful, since we are not using up many possible interactions between those states. What could be a better idea is encoding blocks of bits. For example, we could have states state_a_b, where a denotes the index of a block and b would be the index of the bit in the block. This idea doesn't improve things, since if we have \(k\) blocks of \(l\) bits, then we hard-code \(kl\) bits into \(kl\) states. The reason the above idea doesn't give any improvement is, in a way, that we use the same states to count which block we are in and which bit we are writing. If we somehow separate these, then we can to get to something like \(k+2^l\) states instead, which is much better than \(kl\) if \(k\) is large. The way we separate the two is by using a counter which keeps track of which block we are writing. After reading the counter, we move to the state telling us how the block looks like, and then we basically just go and write the block. Up to some constants, it gives us around \(k+2^l\) states. Construction The tape has two components: a binary counter of length \(k\), initially consisting of zeros and easily constructed using \(k\) steps (setup states; note that if we don't want stationary transitions, we can move right from setup_k to up_2, the result is the same), and the "construction site" where the output is written, initially blank. Next we read off the counter which characters we now write. The states responsible for that are read_a_b for a between \(1\) and \(k-1\) and b a binary string of length a-1; this state means that we are reading a-th bit of the counter and so far we have read the string b. After reading the \(k\)-th bit, we know which step we are at, so we know what string of \(l\) bits we want to write; we go to state move_l_c, where c is a string of \(l\) bits. Counting we get exactly \(2^k-1\) states. The move_l_c states are somewhat auxilary, they move the head off the counter to the construction site; as soon as we hit the blank separating the counter and the construction site we move to the write_l_c state. Here we have an array of write_a_b states, where a is between \(1\) and \(l\) and b is a string of length a; this state means we have a bits left to write, namely b. Transitions are again easy to work out, and this gives us \(2\cdot(2^l-1)\) states. Last thing we have to do is return to the counter, increment it and start over, which is done using states return, up_1 and up_2. When the counter overflows, we remove it using state clear and then we are ready to go. Getting the state count is a matter of addition. Improving the result I have realized that if we used a unary counter, then the machine will get simpler and might actually end up having a few states fewer. Additionally, I think unary counter makes it much easier to implement the whole thing using two colors, without having to invoke a third color. I am not going to work this out in details, since I don't really care about exact numbers; to get a precise bound in the above application we would need to know how many states a C interpreter would have anyways, so I will keep it as is. Category:Blog posts